feat: dx improvements for optimization package #139

Closed
andrewklatzke wants to merge 2 commits into aklatze/AIC-2178/verify-runs-endpoint from aklatzke/AIC-2263/sdk-dx-improvements

andrewklatzke commented Apr 16, 2026

Requirements

  • I have added test coverage for new or changed functionality
  • I have followed the repository's pull request submission guidelines
  • I have validated my changes against all supported platform versions

Describe the solution you've provided

Improves the developer experience when using the SDK and fixes a bug where the global model was being ignored for judges.

Describe alternatives you've considered

This is a QoL change for folks consuming this SDK method. No real alternatives were considered.

Additional context

TL;DR: while implementing this against multiple frameworks, I kept specifying the same handler for both agents and judges. handle_judge_call is therefore now optional and defaults to handle_agent_call when it is not specified. With this change, the optimization config when using an LD-built config is reduced to just this:

OptimizationFromConfigOptions(
    project_key="default",
    handle_agent_call=handle_agent_call,
)

This also adds an is_evaluation flag as the final argument to handle_agent_call, so that a single shared handler can still tell agent calls from judge calls when necessary.
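To illustrate the defaulting behaviour described above, here is a minimal, self-contained sketch. It is not the SDK's implementation: `SketchOptions`, the `Handler` type, and the two-argument handler signature are hypothetical stand-ins, and the real `OptimizationFromConfigOptions` and handler signatures likely take additional arguments.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical handler shape: (prompt, is_evaluation) -> response text.
# The real SDK handlers probably receive more arguments than this.
Handler = Callable[[str, bool], str]


@dataclass
class SketchOptions:
    """Hypothetical stand-in for the SDK options class."""

    project_key: str
    handle_agent_call: Handler
    handle_judge_call: Optional[Handler] = None

    def __post_init__(self) -> None:
        # Fall back to the agent handler when no judge handler is given.
        if self.handle_judge_call is None:
            self.handle_judge_call = self.handle_agent_call


def handle_agent_call(prompt: str, is_evaluation: bool) -> str:
    # is_evaluation is the final argument, so one handler can branch on it.
    role = "judge" if is_evaluation else "agent"
    return f"[{role}] {prompt}"


options = SketchOptions(project_key="default", handle_agent_call=handle_agent_call)
```

In this sketch, a caller that needs different behaviour for judges can still pass an explicit `handle_judge_call`; the fallback only applies when the field is left unset.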


Note

Medium Risk
Public callback signatures change (new required boolean arg) and judge model selection logic is altered, which may break downstream integrations or change evaluation behavior if callers relied on per-judge model overrides.

Overview
Improves the optimization SDK callback surface by making handle_judge_call optional (defaults to handle_agent_call) and adding a final is_evaluation: bool argument to both agent and judge handler signatures so a single shared handler can distinguish generation vs. scoring.

Fixes a judge bug where config-provided judge model names could override the globally configured judge_model; judges now always run on the global judge model while still inheriting other judge flag parameters (e.g. temperature/tools). Also generalizes await_if_needed to support any sync/async return type and updates unit tests accordingly.

Reviewed by Cursor Bugbot for commit a386a27. Bugbot is set up for automated code reviews on this repo.
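The two behaviours summarized in the overview above (a generalized `await_if_needed` and judges always running on the global judge model) could look roughly like the sketch below. Only the name `await_if_needed` comes from this PR; its signature here is assumed, and `resolve_judge_params` and the `"model"` key are hypothetical names chosen for illustration.

```python
import asyncio
import inspect
from typing import Any, Awaitable, Dict, TypeVar, Union

T = TypeVar("T")


async def await_if_needed(value: Union[T, Awaitable[T]]) -> T:
    # Assumed signature: accept either a plain value or an awaitable,
    # regardless of the underlying return type.
    if inspect.isawaitable(value):
        return await value
    return value


def resolve_judge_params(judge_config: Dict[str, Any], judge_model: str) -> Dict[str, Any]:
    # Hypothetical helper: keep the judge's own parameters (for example
    # temperature or tools) but always override the model with the global one.
    params = dict(judge_config)
    params["model"] = judge_model
    return params
```

Under these assumptions, a per-judge config such as `{"model": "per-judge", "temperature": 0.2}` keeps its temperature but runs on whatever is passed as `judge_model`.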

@andrewklatzke andrewklatzke requested a review from a team as a code owner April 16, 2026 22:43